{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## AutoML Sweepable API\n",
    "\n",
    "This Notebook shows how to use `Sweepable` API to fully customize the pipeline or search space in your AutoML task. In this notebook, you will learn\n",
    "- use `AutoTrainer` to simplify your work.\n",
    "- how to use `AutoML().CreateSweepableEstimator` to create `SweepableEstimator`.\n",
    "- how to create `SweepablePipeline` for multiple trainer candidates.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Install Nuget packages and add using statement"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "dotnet_interactive": {
     "language": "csharp"
    },
    "vscode": {
     "languageId": "dotnet-interactive.csharp"
    }
   },
   "outputs": [],
   "source": [
    "// using nightly-build\n",
    "#i \"nuget:https://pkgs.dev.azure.com/dnceng/public/_packaging/MachineLearning/nuget/v3/index.json\"\n",
    "#r \"nuget: Plotly.NET.Interactive, 3.0.2\"\n",
    "#r \"nuget: Plotly.NET.CSharp, 0.0.1\"\n",
    "#r \"nuget: Microsoft.ML.AutoML, 0.20.0-preview.22470.1\"\n",
    "#r \"nuget: Microsoft.Data.Analysis, 0.20.0-preview.22470.1\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "dotnet_interactive": {
     "language": "csharp"
    },
    "vscode": {
     "languageId": "dotnet-interactive.csharp"
    }
   },
   "outputs": [],
   "source": [
    "using static Microsoft.DotNet.Interactive.Formatting.PocketViewTags;\n",
    "using Microsoft.Data.Analysis;\n",
    "using System;\n",
    "using System.IO;\n",
    "using Microsoft.ML;\n",
    "using Microsoft.ML.AutoML;\n",
    "using Microsoft.ML.AutoML.CodeGen;\n",
    "using Microsoft.ML.Trainers.LightGbm;\n",
    "using Microsoft.ML.Data;\n",
    "using Plotly.NET;\n",
    "using Microsoft.ML.Transforms.TimeSeries;\n",
    "using Microsoft.ML.SearchSpace;\n",
    "using System.Diagnostics;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Use `AutoTrainer` for built-in sweepable estimator candidates.\n",
    "\n",
    "`AutoTrainer` provides built-in sweepable estimator candidates for binary-classification, multi-class classification and regression. For those scenarios, you can simply use those candidates instead of creating `SweepableEstimator` from scratch."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "dotnet_interactive": {
     "language": "csharp"
    },
    "vscode": {
     "languageId": "dotnet-interactive.csharp"
    }
   },
   "outputs": [],
   "source": [
    "var context = new MLContext();\n",
    "var regressionTrainerCandidates = context.Auto().Regression();\n",
    "var binaryClassificationTrainerCandidates = context.Auto().BinaryClassification();\n",
    "var multiclassClassificationTrainerCandidates = context.Auto().MultiClassification();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "dotnet_interactive": {
     "language": "csharp"
    }
   },
   "source": [
    "#### Use `AutoML().CreateSweepableEstimator` to create `SweepableEstimator`\n",
    "\n",
    "In case the built-in `SweepableEstimator` doesn't satisfy your requirement, you can call `CreateSweepableEstimator` to create a customized `SweepableEstimator`. A `SweepableEstimator` is nothing different than a normal `Estimator` plus `SearchSpace`. The following code shows how to create a sweepable `LightGbm` and `SDCA`.\n",
    "\n",
    "For simplicity, the built-in search space for `LightGbm` and `SDCA` is used but you can fully customize the search space however way you want. For more details on how to do that, please check [Parameter And SearchSpace](./Parameter%20and%20SearchSpace.ipynb)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "dotnet_interactive": {
     "language": "csharp"
    },
    "vscode": {
     "languageId": "dotnet-interactive.csharp"
    }
   },
   "outputs": [],
   "source": [
    "var lgbmSearchSpace = new SearchSpace<LgbmOption>();\n",
    "var sweepableLgbm = context.Auto().CreateSweepableEstimator((context, param) => {\n",
    "    var option = new LightGbmRegressionTrainer.Options()\n",
    "    {\n",
    "        NumberOfLeaves = param.NumberOfLeaves,\n",
    "        NumberOfIterations = param.NumberOfTrees,\n",
    "        MinimumExampleCountPerLeaf = param.MinimumExampleCountPerLeaf,\n",
    "        LearningRate = param.LearningRate,\n",
    "        LabelColumnName = \"Label\",\n",
    "        FeatureColumnName = \"Features\",\n",
    "        Booster = new GradientBooster.Options()\n",
    "        {\n",
    "            SubsampleFraction = param.SubsampleFraction,\n",
    "            FeatureFraction = param.FeatureFraction,\n",
    "            L1Regularization = param.L1Regularization,\n",
    "            L2Regularization = param.L2Regularization,\n",
    "        },\n",
    "        MaximumBinCountPerFeature = param.MaximumBinCountPerFeature,\n",
    "    };\n",
    "\n",
    "    return context.Regression.Trainers.LightGbm(option);\n",
    "}, lgbmSearchSpace);\n",
    "\n",
    "var sdcaSearchSpace = new SearchSpace<SdcaOption>();\n",
    "var sweepableSdca = context.Auto().CreateSweepableEstimator((context, param) => {\n",
    "    return context.Regression.Trainers.Sdca(\"Label\", \"Features\", l1Regularization: param.L1Regularization, l2Regularization: param.L2Regularization);\n",
    "}, sdcaSearchSpace);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Create `SweepablePipeline` with multiple trainer candidates.\n",
    "\n",
    "`SweepablePipeline` allows you to put multiple estimators as candidates to a certain stage. During AutoML sweeping, these candidates will be evaluated seperatly and the one with best metric will be picked. Note that the estimator doesn't necessarily need to be a trainer, it can be a trainer, transformer or even a `SweepablePipeline`, as long as they all have the same input and output schema.\n",
    "\n",
    "The following code shows how to create a `SweepablePipeline` with `sweepableSdca` and `sweepableLgbm` we created above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "dotnet_interactive": {
     "language": "csharp"
    },
    "vscode": {
     "languageId": "dotnet-interactive.csharp"
    }
   },
   "outputs": [],
   "source": [
    "var sweepablePipeline = context.Transforms.Concatenate(\"Features\", \"X1\", \"X2\")\n",
    "                            .Append(sweepableSdca, sweepableLgbm);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Config `AutoMLExperiment` using `sweepablePipeline`\n",
    "In the next step, we are going to train `sweepablePipeline` on a generated non-linear dataset using `AutoMLExperiment`, which will sweeping both `sdca` and `lightGbm` on configured search space. Considering that `sdca` is a linear classifier, the winning model should be `lightGbm`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "dotnet_interactive": {
     "language": "csharp"
    },
    "vscode": {
     "languageId": "dotnet-interactive.csharp"
    }
   },
   "outputs": [],
   "source": [
    "var rand = new Random(0);\n",
    "var context =new MLContext(seed: 1);\n",
    "var x1 = Enumerable.Range(0, 1000).Select(_x => rand.NextSingle() * 100).ToArray();\n",
    "var x2 = x1.Select(_x => rand.NextSingle() * 100).ToArray();\n",
    "var y = Enumerable.Zip(x1, x2).Select(_x => _x.Second * _x.First + (rand.NextSingle() - 0.5f) * 10).ToArray();\n",
    "var df = new DataFrame();\n",
    "df[\"X1\"] = DataFrameColumn.Create(\"X1\", x1);\n",
    "df[\"X2\"] = DataFrameColumn.Create(\"X2\", x2);\n",
    "df[\"Label\"] = DataFrameColumn.Create(\"Label\", y);\n",
    "var trainTestSplit = context.Data.TrainTestSplit(df);\n",
    "df.Head(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "dotnet_interactive": {
     "language": "csharp"
    },
    "vscode": {
     "languageId": "dotnet-interactive.csharp"
    }
   },
   "outputs": [],
   "source": [
    "var monitor = new NotebookMonitor(sweepablePipeline);\n",
    "var experiment = context.Auto().CreateExperiment();\n",
    "experiment.SetDataset(df, 5)\n",
    "          .SetPipeline(sweepablePipeline)\n",
    "          .SetTrainingTimeInSeconds(50)\n",
    "          .SetRegressionMetric(RegressionMetric.RootMeanSquaredError)\n",
    "          .SetMonitor(monitor);\n",
    "\n",
    "// Configure Visualizer\t\t\t\n",
    "monitor.SetUpdate(monitor.Display());\n",
    "\n",
    "var res = await experiment.RunAsync();\n",
    "\n",
    "// check the type of last trainer for winning model, which should be lightGbm\n",
    "(res.Model as TransformerChain<ITransformer>).Last().GetType()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Continue learning\n",
    "\n",
    "> [⏩ Next Module - AutoML Tuner](./06-AutoML%20HPO%20and%20tuner.ipynb)  \n",
    "> [⏪ Last Module - AutoML Sweepable API](./05%20-%20AutoML%20Sweepable%20API.ipynb)\n",
    "\n",
    "## See also\n",
    "- [Training and AutoML](./03-Training%20and%20AutoML.ipynb)\n",
    "- [Regression with Taxi Dataset](./E2E-Regression%20with%20Taxi%20Dataset.ipynb)\n",
    "- [Classification with Iris Dataset](./E2E-Classification%20with%20Iris%20Dataset.ipynb)\n",
    "- [Kaggle with Titanic Dataset](./REF-Kaggle%20with%20Titanic%20Dataset.ipynb)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".NET (C#)",
   "language": "C#",
   "name": ".net-csharp"
  },
  "language_info": {
   "file_extension": ".cs",
   "mimetype": "text/x-csharp",
   "name": "C#",
   "pygments_lexer": "csharp",
   "version": "9.0"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
